Xenophones: An investigation of phone set expansion in Swedish and implications for speech recognition and speech synthesis

Authors

  • Robert Eklund
  • Anders Lindström
Abstract

In recent years, both automatic speech recognition (ASR) and text-to-speech (TTS) conversion systems have attained quality levels that allow inclusion in everyday applications. One remaining problem to be solved in both these types of applications is that alleged phone inventories of specific languages are commonly expanded with phones from other languages, a problem that becomes all the more acute in an increasingly internationalized world where multilingual automatic speech-based services are a desideratum. This paper investigates the nature of phone set expansion in Swedish. The status of these phones is discussed, and since such added phones do not have a phonemic (or allophonic) function, the term 'xenophones' is suggested. The analysis is based on a production study involving 491 subjects, and the observed xenophonic expansion is described in terms of three categories along the "awareness" and the "fidelity" dimensions. The results show that very few subjects resort to full rephonematization and that xenophonic expansion is the rule, although the distribution is uneven across particular phones, ranging from phones produced by most subjects to phones produced by almost no subjects. Of the possible explanatory factors analyzed (regional background, gender, age and educational level), the last is by far the most important.


Similar resources

[jɑːmes] or [dʒɛɪmz] or Perhaps Something In-between? Recapping Three Years of Xenophone Studies

This paper summarises work on ‘xenophones’ (foreign sounds) carried out at Telia Research. The inclusion of “foreign” sounds in Swedish is described, as well as their implications for speech recognition and speech synthesis. Results from two earlier studies are summarised and described: the nature of the expansion of what is normally regarded as the Swedish phone set, and the nature of some poss...


How foreign are “foreign” speech sounds? Implications for speech recognition and speech synthesis

This paper reports results from a production study which shows how, in everyday speech, the traditional Swedish phone set is expanded with phones similar to or approximating phones from languages other than Swedish. The inclusion of such sounds – here called xenophones – has implications for both automatic speech recognition and speech synthesis systems, especially in polylingual environ...


How to handle "foreign" sounds in Swedish text-to-speech conversion: approaching the 'xenophone' problem

This paper discusses the problem of handling “foreign” speech sounds in Swedish speech technology systems, in particular speech synthesis. A production study is made, where it is shown that Swedish speakers add foreign speech sounds, here termed ‘xenophones’, to their phone repertoire when reading Swedish sentences with embedded English names and words. As a result of the observations, the phon...


Persian Phone Recognition Using Acoustic Landmarks and Neural Network-based variability compensation methods

Speech recognition is a subfield of artificial intelligence that develops technologies to convert speech utterances into transcriptions. So far, various methods such as hidden Markov models and artificial neural networks have been used to develop speech recognition systems. In most of these systems, the speech signal frames are processed uniformly, while the information is not evenly distributed ...


An Information-Theoretic Discussion of Convolutional Bottleneck Features for Robust Speech Recognition

Convolutional Neural Networks (CNNs) have shown good performance in speech recognition systems, both for feature extraction and for acoustic modeling. In addition, CNNs have been used for robust speech recognition, and competitive results have been reported. A Convolutive Bottleneck Network (CBN) is a kind of CNN which has a bottleneck layer among its fully connected layers. The bottleneck fea...



Journal title:
  • Speech Communication

Volume 35

Publication date 2001